Skip to content

Conversation

@emilazy
Copy link
Member

@emilazy emilazy commented Dec 7, 2025

The activity endpoint does not work with the merge queue, and counts things like branch creations that we don’t really want to include.

This finds two‐parent merge commits with the correct author committed and signed by the GitHub key that are reachable from the main branch; I believe that PR merges are the only such commits, but any exceptions should be very marginal.

Closes: #69


This is untested, because I don’t have access to the test org. I did make sure that the gh(1) calls seem to work, though.

@emilazy
Copy link
Member Author

emilazy commented Dec 7, 2025

I have now tested this locally, after some additional fixes I’ve pushed, without PROD=1. It seems to work fine – it does in fact want to nominate a handful of committers who have only made squash/rebase merges before the merge queue was enabled, but a spot check shows that these are not far from the inactivity cut‐off and only marginally active anyway, so I think we can handle those manually rather than increasing code complexity here.

It does also try to retire @lf-, who the GitHub API seems to hate (you can see them special‐cased in the old RFC 55 tool too). I’d rather report that via our GitHub point of contact and see if we can get it fixed, or else just manually close the retirement PR once a year, than add hacks everywhere we use the API.

So I think this should be ready.

@Pandapip1
Copy link
Member

Pandapip1 commented Dec 7, 2025

It does also try to retire @lf-, who the GitHub API seems to hate (you can see them special‐cased in the old RFC 55 tool too).

Is this because their activity is set to private? If so, we might just want to look at the git history rather than going through the GH API.

@emilazy
Copy link
Member Author

emilazy commented Dec 7, 2025

I don’t think so, because it doesn’t also try to retire @wolfgangwalther.

Copy link
Contributor

@MattSturgeon MattSturgeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM overall, minor nits + observations:

cat "$tmp/$login"
while read -r sha; do
gh api "/repos/$ORG/$ACTIVITY_REPO/commits/$sha/pulls" --jq '.[] |
select(.merge_commit_sha == "'"$sha"'") |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, embedding raw values directly into code expressions (of any language) is something to avoid. Here it happens to be safe because the value is always a hex SHA that won't include quotes.

With standalone jq we could pass it as --arg sha "$sha", but gh --jq doesn't support that so embedding is acceptable.

TL;DR: just a minor observation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I hate doing this. I spent a minute or so trying to think of a way to avoid it, and then decided that I’d rather rewrite it in GitHub Script if I’m going down that road anyway.

That said, given your point about the merge queue not allowing altering the commit message (which I had forgotten), I believe we could skip this loop and its API calls entirely and simply parse the PR numbers out from the commits. I can make that modification if you think it’s worthwhile. I am not sure this section of code has ever actually been used, though, since it only triggers when someone is active after the retirement PR is opened, before it the month passes, and before it’s manually closed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up doing this to save the API calls – though we have to inject the repository prefix anyway, and the script was previously doing something more dangerous there! But I just did it through the environment instead, so it’s all good. At least, as good as Bash can be.

# Get the unix epoch of the first commit that touched this file
# --first-parent is important to get the time of when the main branch was changed
fileCommitEpoch=$(git log --reverse --first-parent --format=%cd --date=unix -- "$login" | head -1)
fileCommitEpoch=$(git log --reverse --first-parent --format=%cd --date=unix -- "$login" | head -1 || true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using || true here effectively hides any error from git log. The intent is to handle the BSD vs GNU head -1 discrepancy, an alternative that avoids masking real errors would be to use sed:

Suggested change
fileCommitEpoch=$(git log --reverse --first-parent --format=%cd --date=unix -- "$login" | head -1 || true)
fileCommitEpoch=$(git log --reverse --first-parent --format=%cd --date=unix -- "$login" | sed -n 1p)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is with GNU head(1), actually; the script has a dependency on GNU coreutils for date --date=…. It reproduces with both macOS and GNU head(1), and… also reproduces on Linux:

[emily@build01:~/nixpkgs]$ bash -c 'set -euo pipefail; foo=$(git log --reverse --first-parent --format=%cd --date=unix -- default.nix | head -1); echo hi'

[emily@build01:~/nixpkgs]$ 

So… I have no idea why it’s been working in CI. Maybe something to do with the Git configuration of the GHA runners, or terminal session signal handling, or something.

I believe sed -n 1p will let the git log command run to completion, so it’s less efficient in repositories with long histories. That shouldn’t matter here, of course.

However, this logic is actually incorrect when a committer is retired and then re‐added! It will find the first addition of the file, rather than the latest. So I’ve changed it to get the relevant information directly from Git instead.

-f time_period=year \
-f actor="$login" \
trace gh api -X GET /repos/"$ORG"/"$ACTIVITY_REPO"/commits \
-f since="$(date --date='1 year ago' --iso-8601=seconds)Z" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--iso-8601=seconds already appends a timezone designator:

  • GNU date ends with +00:00
  • BSD date ends with Z

Manually adding another Z will therefore produce invalid timestamps such as 2025-01-01T12:00:00+00:00Z, so we should drop the extra suffix:

Suggested change
-f since="$(date --date='1 year ago' --iso-8601=seconds)Z" \
-f since="$(date --date='1 year ago' --iso-8601=seconds)" \

If you want full cross-platform consistency, using -u with a custom format avoids the GNU/BSD differences entirely:

Suggested change
-f since="$(date --date='1 year ago' --iso-8601=seconds)Z" \
-f since="$(date --date='1 year ago' -u +%Y-%m-%dT%H:%M:%SZ)" \

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darwin date(1) doesn’t support --iso-8601 at all. I missed the time zone designator in the output and was trying to be paranoid in matching the documented format. However, it seems like the API parses numeric time zones correctly here, so I’ve just removed the Z as suggested.

echo "One month has passed, and @$login has been active again:"
cat "$tmp/$login"
while read -r sha; do
gh api "/repos/$ORG/$ACTIVITY_REPO/commits/$sha/pulls" --jq '.[] |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small FYI: gh api also supports {} path templates together with -f fields, which can help separate variable data from the path and avoid quoting/escaping pitfalls.

For example, this call could be written as:

gh api -X GET /repos/{org}/{repo}/commits/{sha}/pulls \
  -f org="$ORG" \
  -f repo="$ACTIVITY_REPO" \
  -f sha="$sha" \
  --jq '
    ...
  '

But the current style matches the rest of the file and is likely clearer for most readers, so no change needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I do prefer that. But I’d prefer a proper programming language more, and it doesn’t seem worth refactoring the other calls for right now :)

It seems like we could also use -F for fancier type‐checking of fields (?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we could also use -F for fancier type‐checking of fields (?).

Yeah, idk if it makes much difference for GET requests that use query params, but yes -F signals to gh that the value is a string. I think in some cases that'll lead to different escape sequences or quoting, though in practice, -f and -F seem to produce similar results.

Otherwise, a committer that is removed and then added again may be
proposed for retirement within the first year of their new tenure.

This also fixes an interaction between `-o pipefail` and `SIGPIPE`
that seems to reproduce in local tests on both Darwin and Linux for
me, but somehow hasn’t been affecting CI.
@emilazy
Copy link
Member Author

emilazy commented Dec 8, 2025

I’ve checked that the script still seems to be working correctly after my latest changes.

FWIW, another potential failure mode here is if someone makes 100 non‐PR‐merge commits with a @web-flow committer (web edits, reverts, etc.) after their merge, even if that was less than a year ago. However I of course don’t take that one very seriously. Ultimately it’s not worth spending more time on the automation than it will save in triage.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the issue with the username @lf- is fixed, the script would still retire these additional committers who have merged something in the past year before merge queue's and without a merge commit:

In addition, there may be many more users who would be retired much earlier throughout the next year just for the same reason.

We should combine the commit query with the activity one from before, so that either counts.

Furthermore, we should make sure that the manual tests at least keep succeeding, they don't right now: https://github.com/NixOS/nixpkgs-committers/tree/main/scripts#testing-nominationsh, or even better, extend them to also test merge queues.

Comment on lines +112 to 116
trace gh api -X GET /repos/"$ORG"/"$ACTIVITY_REPO"/commits \
-f since="$(date --date='1 year ago' --iso-8601=seconds)" \
-f author="$login" \
-f committer=web-flow \
-f per_page=100 \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid the issues with @lf- by using the GraphQL API:

#!/usr/bin/env bash
set -euo pipefail

user=$1

id=$(gh api graphql -f query='query($user: String!) { user(login: $user) { id } })' -f user="$user" --jq '.data.user.id')

if ! result=$(gh api graphql --paginate -f query="$(<commits.graphql)" -f id="$id" -f since="$(date -Iseconds --date='1 year ago')"); then
  echo "$result"
else
  jq . > result.json <<< "$result"
fi

gh api graphql --paginate -f query='
query($id: ID!, $since: GitTimestamp!, $endCursor: String) {
  repository(owner: "NixOS", name: "nixpkgs") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(since: $since, first: 100, author: { id: $id }, after: $endCursor) {
            nodes {
              parents(first: 2) {
                nodes {
                  oid
                }
              }
              signature {
                wasSignedByGitHub
              }
              committedDate
              url
            }
            pageInfo {
              hasNextPage
              endCursor
            }
          }
        }
      }
    }
  }
}' -f id="$id" -f since="$(date -Iseconds --date='1 year ago')" \
  --jq '.data.repository.ref.target.history.nodes[] | select((.parents.nodes | length > 1) and .signature.wasSignedByGitHub) | pick(.url, .committedDate)'

I'm out of time myself for now, but I can integrate this into the PR later.

Fyi: The gh CLI supports pagination both for the REST and GraphQL API's ;)

Copy link
Member Author

@emilazy emilazy Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn’t the first time automation has had to special‐case @lf-, and some things are exposed only through the REST API, I’d much rather use our connection to GitHub to try and get this API bug fixed before considering increasing the code complexity like this.

I deliberately skipped pagination, since it’s extremely unlikely for someone to have 100 @web-flow commits through after their last PR merge, and it’d make the script run much slower in the presence of highly‐active committers. This GraphQL version makes that case much more likely, since it will pick up every commit authored by a committer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, to be clear, when Emily iterated on this, we discussed things - and she already wrote a very similar GraphQL query, but the problem there is that it is impossible to filter by committer via GraphQL.

Copy link
Member Author

@emilazy emilazy Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does filter after the fact by the GitHub signature, which should correctly handle that, FWIW – but it also means having to enumerate every authored commit in the past year (since 100 non‐web commits authored after the last merge is much more plausible), which for active committers I think would significantly slow the script down. The commits endpoint is already somewhat slower than the activity one. So I don’t think this is a good idea just to avoid a bug we can report to GitHub that adds 1 PR/year additional triage load, even before code complexity is taken into account.

Copy link
Member

@lf- lf- Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reported this bug to GitHub and they told me they don't care to fix it. It's that they're using the current username regex while evaluating searches rather than one which matches their historical username requirements, and it's probably a 1-line fix. Their reply didn't really make sense to me as to why they didn't want to fix it.

You can try again if you'd like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it’d make the script run much slower in the presence of highly‐active committers

We can stop the pagination once any valid merge has been found, which should always just need one page for active committers.

While I'm happy to maintain this repo even with added complexity to get it work nicely, it really does seem like this is a one-off case, especially considering what @lf- mentioned, so I guess this is fine.

Copy link
Member Author

@emilazy emilazy Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lf- Aha, interesting. I’m not sure if that’s the only thing going on here, though – Wolfgang pointed out that he has seen it happen with @mbrgm too, which presumably doesn’t have any name validation issues. Maybe the regex is for search endpoints and whatever is going on with the API here is something else – or maybe the issue with @mbrgm is unrelated (it seems they have some commits in master, but they don’t show up in /commits?author=… in the web interface either, whereas yours do).

I certainly think it’s worth asking GitHub about, since we have another call with them scheduled in a week anyway.

@infinisil

I tried the GraphQL snippet here with @fabaff, and it took almost 5 minutes for that one committer, so I think this approach is certainly a non‐starter without some kind of short‐circuiting. However there are subtleties to that too, when considering e.g. the “user has become inactive again” logic.

What would probably be more viable is doing something like git clone --bare --single-branch --no-tags --filter=tree:0 https://github.com/NixOS/nixpkgs.git, git log --first-parent to aggregate commits from the last year and filter by author (if the committer is GitHub) and otherwise committer, group by email address, take the first commit for each, and then use the API to map those commits to usernames; whatever is left are inactive committers. But I believe the current state of the PR is really fine and think it would not be a good return‐on‐investment to work on that.

@emilazy
Copy link
Member Author

emilazy commented Dec 8, 2025

Even if the issue with the username @lf- is fixed, the script would still retire these additional committers who have merged something in the past year before merge queue's and without a merge commit:

Yes, I addressed this in my earlier comment:

I have now tested this locally, after some additional fixes I’ve pushed, without PROD=1. It seems to work fine – it does in fact want to nominate a handful of committers who have only made squash/rebase merges before the merge queue was enabled, but a spot check shows that these are not far from the inactivity cut‐off and only marginally active anyway, so I think we can handle those manually rather than increasing code complexity here.

I believe it’s fine for us to manually triage these. Two of these would be proposed for retirement next month regardless, and one in three months; it seems fine to ping them and see if they still intend to be active. The others we could close to postpone for a year, or leave open until they would otherwise be proposed for retirement. The script will close these PRs if any of them become active again.

I think it is better to err on the side of counting fewer things than counting more, for the same reason that we don’t try to capture non‐PR‐merge committer actions. Most of these committers are only very minimally active, and can be handled similarly to how we would handle committers that are only using their bit to merge backports, or to create or delete branches, or to do manual periodic merge conflict resolution, or to push to others’ PR branches.

The inactivity criteria are already very lenient, and the retirement PRs are subject to manual approval by the team; since we established that the activity endpoint counts too much and have already manually retired one user as a result, I’m opposed to increasing the code complexity here for these temporary edge cases (especially while it’s written in Bash).

In addition, there may be many more users who would be retired much earlier throughout the next year just for the same reason.

We enabled the merge queue for master on 2025-09-10. I checked what the results would be on 2026-09-10, assuming no more merges happen since then. There are only two further users that would be proposed for retirement early before then – @timokau less than three months early, and @r-burns less than a month early – and of course, that will only happen if none of them have any further activity. I don’t think this is an issue and would prefer to land this as‐is.

@emilazy
Copy link
Member Author

emilazy commented Dec 8, 2025

Furthermore, we should make sure that the manual tests at least keep succeeding, they don't right now: https://github.com/NixOS/nixpkgs-committers/tree/main/scripts#testing-nominationsh, or even better, extend them to also test merge queues.

Could you explain what is failing? I tested the retirement script locally on NixOS/nixpkgs (with a non‐privileged access token and skipping PROD=1) and it behaved correctly. This links to the section for testing nomination.sh, which isn’t touched here – is the scripts/common.sh change leading to some issue with timestamps or something?

I can try to get the test environment set up to address this if it’s something that this PR causes a regression in, but it would be good to check these things in CI if they’re load-bearing.

@wolfgangwalther
Copy link
Contributor

The inactivity criteria are already very lenient, and the retirement PRs are subject to manual approval by the team; since we established that the activity endpoint counts too much and have already manually retired one user as a result

I agree with that. It's much less effort to double check retirement PRs that are created even though a few squash merges happened, compared to checking all committers manually for these outliers, that automation didn't catch. I can say that now, because I did just check all committers (and I even ran scripts across all maintainers for activity).

Also, I personally strongly believe that "zero activity" is a good threshold for automation, but "very low activity" is enough of a reason for removal already - especially when the committer in question will not come back to the PR within a full month. We gain almost nothing as the Nixpkgs project by having 10 committers on board, who each merge 1-2 PRs a year. I even looked at these things and among these committers, chances for self-merge, often unreviewed, is very high - these committers just come back to Nixpkgs to serve their own itch, but are not actively contributing in reviewing others' PRs - and that's what a committer is about. They don't need the commit bit for that, they can just open a PR like other contributors, too and have somebody else, who will be much more familiar with current contribution guidelines and so on, do the merge.

Copy link
Contributor

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM

@infinisil
Copy link
Member

This links to the section for testing nomination.sh, which isn’t touched here

Meant to link to the section further down: https://github.com/NixOS/nixpkgs-committers/tree/main/scripts#testing-retiresh

Comment on lines -104 to -106
trace gh api -X GET /repos/"$ORG"/"$ACTIVITY_REPO"/activity \
-f time_period=year \
-f actor="$login" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding feedback for GitHub, it seems like what's really missing is that the activity endpoint should keep track of merge queue additions. If we had that, we wouldn't need any changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge queue counting as activity would be good too and probably something we should raise, but I’m guessing it’d be a less trivial fix (it seems like much of the merge queue stuff happens through an internal bot account) – and we’d still need to make adjustments here to filter out non‐PR‐merging activity, so I’m not sure it’d be a net reduction in complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know how GitHub's backend works, so I don't think we can make realistic estimates for complexities of changes, neither for the username regex nor for the activity endpoint. What we can do is suggest multiple potential solutions to them, and see if they can do any of them.

For filtering the non-PR-merging activity, it would just be select(.activity_type == ...) in the jq.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the merge queue events are visible in the activity results - just for the web-flow or other bot users. So it'd be a change that doesn't make any semantic sense for them: The user that pushes the head of the temporary merge queue branch to the target branch is the bot.

The right thing to do would be to look at the events where the PR is added to the merge queue. But these are not repository events, they are PR events. We can already fetch them for each PR, but we won't get them via the activity endpoint.

TLDR: I don't even think there is a reasonable change for GitHub to make - everything works as it should.

@emilazy
Copy link
Member Author

emilazy commented Dec 8, 2025

Meant to link to the section further down: https://github.com/NixOS/nixpkgs-committers/tree/main/scripts#testing-retiresh

As I said, can you explain the issue?

“Check that no PR would be opened” on step 1 doesn’t work for me, because I don’t have any activity in the test org, but that’s the same before this PR too.

Step 2 gives gh: Git Repository is empty. (HTTP 409) – if that’s what you mean, then it seems like the simplest fix would be to make sure the empty repository has a single dummy commit rather than complicating the script for a case that will never happen in reality. Git repositories with no commits behave pretty weirdly in general.

@infinisil
Copy link
Member

infinisil commented Dec 8, 2025

“Check that no PR would be opened” on step 1 doesn’t work for me, because I don’t have any activity in the test org, but that’s the same before this PR too.

https://github.com/NixOS/nixpkgs-committers/tree/main/scripts#setup

@infinisil
Copy link
Member

infinisil commented Dec 9, 2025

Step 2 gives gh: Git Repository is empty. (HTTP 409) – if that’s what you mean, then it seems like the simplest fix would be to make sure the empty repository has a single dummy commit rather than complicating the script for a case that will never happen in reality. Git repositories with no commits behave pretty weirdly in general.

Yeah so that's effectively a test failure. Feel free to adjust the test org repos so that it works with the new script. Makes sense that this wouldn't have been required for the activity endpoint.

The activity endpoint does not work with the merge queue, and counts
things like branch creations that we don’t really want to include.

This finds two‐parent merge commits with the correct author committed
and signed by the GitHub key that are reachable from the main branch;
I believe that PR merges are the only such commits, but any exceptions
should be very marginal.

Closes: NixOS#69
@emilazy
Copy link
Member Author

emilazy commented Dec 9, 2025

The tests are now passing.

Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM then!

@infinisil infinisil merged commit dbaac1b into NixOS:main Dec 9, 2025
1 check passed
@infinisil
Copy link
Member

I reenabled the workflow

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixpkgs-core-team-update-2025-12-20/73494/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automation to retire committers does not recognize merge queue merges

8 participants